\[~\]
\[~\]
\[~\]
Nowadays, thanks to technological progress, we have access to an amount of data that allows us to assess whether a person's health shows any problems.
In this project we analyze such data and try to predict whether a patient should be diagnosed with heart disease or not.
\[~\]
\[~\]
\[~\]
Kaggle is the platform where I found this interesting dataset, to which we apply Bayesian inference as the main scope of this project.
The dataset is composed of the following features:
##
## -- Column specification --------------------------------------------------------
## cols(
## age = col_double(),
## sex = col_double(),
## cp = col_double(),
## trtbps = col_double(),
## chol = col_double(),
## fbs = col_double(),
## restecg = col_double(),
## thalachh = col_double(),
## exng = col_double(),
## oldpeak = col_double(),
## slp = col_double(),
## caa = col_double(),
## thall = col_double(),
## output = col_double()
## )
## # A tibble: 6 x 14
## age sex cp trtbps chol fbs restecg thalachh exng oldpeak slp
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 63 1 3 145 233 1 0 150 0 2.3 0
## 2 37 1 2 130 250 0 1 187 0 3.5 0
## 3 41 0 1 130 204 0 0 172 0 1.4 2
## 4 56 1 1 120 236 0 1 178 0 0.8 2
## 5 57 0 0 120 354 0 1 163 1 0.6 2
## 6 57 1 0 140 192 0 1 148 0 0.4 1
## # ... with 3 more variables: caa <dbl>, thall <dbl>, output <dbl>
\[~\]
Analyzing these features, I recognized the following:
\[~\]
The summary statistics of the main features are:
\[~\]
## age sex cp trtbps
## Min. :29.00 Min. :0.0000 Min. :0.0000 Min. : 94.0
## 1st Qu.:48.00 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:120.0
## Median :55.50 Median :1.0000 Median :1.0000 Median :130.0
## Mean :54.42 Mean :0.6821 Mean :0.9636 Mean :131.6
## 3rd Qu.:61.00 3rd Qu.:1.0000 3rd Qu.:2.0000 3rd Qu.:140.0
## Max. :77.00 Max. :1.0000 Max. :3.0000 Max. :200.0
## chol fbs restecg thalachh
## Min. :126.0 Min. :0.000 Min. :0.0000 Min. : 71.0
## 1st Qu.:211.0 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:133.2
## Median :240.5 Median :0.000 Median :1.0000 Median :152.5
## Mean :246.5 Mean :0.149 Mean :0.5265 Mean :149.6
## 3rd Qu.:274.8 3rd Qu.:0.000 3rd Qu.:1.0000 3rd Qu.:166.0
## Max. :564.0 Max. :1.000 Max. :2.0000 Max. :202.0
## exng oldpeak slp caa
## Min. :0.0000 Min. :0.000 Min. :0.000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:1.000 1st Qu.:0.0000
## Median :0.0000 Median :0.800 Median :1.000 Median :0.0000
## Mean :0.3278 Mean :1.043 Mean :1.397 Mean :0.7185
## 3rd Qu.:1.0000 3rd Qu.:1.600 3rd Qu.:2.000 3rd Qu.:1.0000
## Max. :1.0000 Max. :6.200 Max. :2.000 Max. :4.0000
## thall output
## Min. :0.000 Min. :0.000
## 1st Qu.:2.000 1st Qu.:0.000
## Median :2.000 Median :1.000
## Mean :2.315 Mean :0.543
## 3rd Qu.:3.000 3rd Qu.:1.000
## Max. :3.000 Max. :1.000
\[~\]
\[~\]
In this subsection we describe the dataset graphically, to get a first idea of which model is better suited to this case. Below we show some interesting plots.
\[~\]
\[~\]
PCA is a tool that allows us to visualize the behaviour of the patients in our data (treated as points in a lower-dimensional space). In this way we can see which patients are most similar to each other.
\[~\]
\[~\]
It is normal to note that this representation involves some loss of information, due to the number of features we are projecting.
\[~\]
\[~\]
Here, we analyze the percentages of each categorical variable:
\[~\]
\[~\]
\[~\]
Here, we illustrate the main features of the quantitative variables, after having seen the categorical ones in the previous section:
\[~\]
hist.and.summary('age', 'Persons Age')
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 29.00 48.00 55.50 54.42 61.00 77.00
hist.and.summary('chol', 'Cholesterol')
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 126.0 211.0 240.5 246.5 274.8 564.0
hist.and.summary('thalachh', 'Maximum Heart Rate Achieved')
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 71.0 133.2 152.5 149.6 166.0 202.0
hist.and.summary('trtbps', 'Resting Blood Pressure')
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 94.0 120.0 130.0 131.6 140.0 200.0
hist.and.summary('oldpeak', 'Previous Peak')
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 0.800 1.043 1.600 6.200
\[~\]
Next, we also compare the densities of these variables between the patients who had a heart attack and those who did not:
\[~\]
dense.chart('age', 'Persons Age')
dense.chart('chol', 'Cholesterol')
dense.chart('thalachh', 'Maximum Heart Rate Achieved')
dense.chart('trtbps', 'Resting Blood Pressure')
dense.chart('oldpeak', 'Previous Peak')
\[~\]
Here, we highlight the correlations between the features, treated qualitatively, quantitatively, and without any distinction.
\[~\]
\[~\]
The main goal is to carry out a fully Bayesian analysis aimed at understanding whether a person has a heart attack or not.
So, the response variable \(Y_i\) (the “output” feature in our dataset) is the heart attack indicator \(\in \{0,1\}\), and the predictor variables (\(x_i \in \mathbb{R}^{+}\)) have been chosen using the glm() function, which fits generalized linear models, as we can see below:
\[~\]
##
## Call:
## glm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 +
## x10 + x11 + x12 + x13, family = binomial(), data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.5751 -0.3868 0.1514 0.5841 2.6239
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.304169 2.577891 1.282 0.199936
## x1 -0.001469 0.023464 -0.063 0.950062
## x2 -1.750930 0.468199 -3.740 0.000184 ***
## x3 0.847283 0.185548 4.566 4.96e-06 ***
## x4 -0.020188 0.010386 -1.944 0.051916 .
## x5 -0.004489 0.003806 -1.179 0.238252
## x6 0.073463 0.532452 0.138 0.890263
## x7 0.450607 0.348505 1.293 0.196021
## x8 0.023134 0.010449 2.214 0.026835 *
## x9 -0.981017 0.409806 -2.394 0.016672 *
## x10 -0.523604 0.214467 -2.441 0.014630 *
## x11 0.589074 0.349864 1.684 0.092236 .
## x12 -0.826015 0.201922 -4.091 4.30e-05 ***
## x13 -0.887203 0.290730 -3.052 0.002276 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 416.42 on 301 degrees of freedom
## Residual deviance: 210.35 on 288 degrees of freedom
## AIC: 238.35
##
## Number of Fisher Scoring iterations: 6
\[~\]
As we can see above, glm() rejects the hypothesis of keeping the variables:
… and the intercept is also not a good choice to include in our model.
In summary, we have N = 302 observations. We then decided to consider two models and see which of the two is, in the end, the best model for our analysis.
\[~\]
\[~\]
Likelihood \(\pi(y_{obs}|\theta)\): measures the goodness of fit of a statistical model to a sample of data for given values of the unknown parameters. It is formed from the joint probability distribution of the sample, but viewed and used as a function of the parameters only, treating the random variables as fixed at their observed values.
Logistic regression model: logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). It is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
Logit link function: it uses the cumulative distribution function (CDF) of the logistic distribution. A benefit of using the logit is that the coefficients can be interpreted in terms of odds ratios.
Cloglog link function: unlike logit and probit, it is asymmetric. It is best used when the probability of an event is very small or very large. The complementary log-log approaches 0 more slowly than any other link function. The cloglog model is closely related to continuous-time models for the occurrence of events, so it has important applications in survival analysis and hazard modeling.
Prior \(\pi(\theta)\): the probability distribution expressing one's beliefs about the parameter before the evidence is taken into account.
Deviance: a goodness-of-fit statistic for a statistical model, often used for statistical hypothesis testing. It generalizes the idea of using the residual sum of squares (RSS) in ordinary least squares to cases where model fitting is achieved by maximum likelihood.
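To make the difference between the two link functions concrete, here is a small R sketch (not part of the report's code) comparing the inverse logit and inverse cloglog transformations:

```r
# Inverse link functions: map a linear predictor eta to a probability p
inv_logit   <- function(eta) 1 / (1 + exp(-eta))   # logistic CDF
inv_cloglog <- function(eta) 1 - exp(-exp(eta))    # complementary log-log

inv_logit(0)    # 0.5: the logit link is symmetric around eta = 0
inv_cloglog(0)  # ~0.632 = 1 - exp(-1): the cloglog link is asymmetric
```

Exponentiating a logit coefficient gives an odds ratio, which is why the logit scale is easy to interpret.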
\[~\]
\[~\]
In the first model, we decide to consider the following logistic regression model with the logit link function:
\[ Y_i \sim Bern(p_i) \\ logit(p_i) = \beta_{2} \cdot x_{2_i} + \beta_{3} \cdot x_{3_i} + \beta_{4} \cdot x_{4_i} +\beta_{8} \cdot x_{8_i} + \beta_{9} \cdot x_{9_i} + \beta_{10} \cdot x_{10_i} + \beta_{11} \cdot x_{11_i} + \beta_{12} \cdot x_{12_i} + \beta_{13} \cdot x_{13_i} \] The prior beta parameters are chosen approximately, based on the estimated coefficients seen in the previous subsection from the output of glm(). So, the beta prior parameters follow normal distributions:
\[ \beta_2 \sim N(\mu_2 = -1, \tau^{2} = 1.0 \cdot 10^{-6}) \\ \beta_3 \sim N(\mu_3 = 0.5, \tau^{2} = 1.0 \cdot 10^{-6}) \\ \beta_4 \sim N(\mu_4 = 0, \tau^{2} = 1.0 \cdot 10^{-6}) \\ \beta_8 \sim N(\mu_8 = 0, \tau^{2} = 1.0 \cdot 10^{-6}) \\ \beta_9 \sim N(\mu_9 = -0.5, \tau^{2} = 1.0 \cdot 10^{-6}) \\ \beta_{10} \sim N(\mu_{10} = 0.5, \tau^{2} = 1.0 \cdot 10^{-6}) \\ \beta_{11} \sim N(\mu_{11} = 0.5, \tau^{2} = 1.0 \cdot 10^{-6}) \\ \beta_{12} \sim N(\mu_{12} = -0.5, \tau^{2} = 1.0 \cdot 10^{-6}) \\ \beta_{13} \sim N(\mu_{13} = -0.5, \tau^{2} = 1.0 \cdot 10^{-6}) \\ \]
The main reason for this choice is that, for a binary classification problem like ours, logistic regression is the natural way to model outcomes that take only the values 0 or 1.
\[~\]
\[~\]
As we can see below, we implemented the model using RJags:
\[~\]
\[~\]
model <- function(){
  # Likelihood
  for (i in 1:N){
    y[i] ~ dbern(p[i])
    logit(p[i]) <- beta2*x2[i] + beta3*x3[i] + beta4*x4[i] + beta8*x8[i] +
      beta9*x9[i] + beta10*x10[i] + beta11*x11[i] + beta12*x12[i] +
      beta13*x13[i] # omit the intercept and the non-useful variables
  }
  # Defining the prior beta parameters
  beta2  ~ dnorm(-1,   1.0E-6)
  beta3  ~ dnorm(0.5,  1.0E-6)
  beta4  ~ dnorm(0,    1.0E-6)
  beta8  ~ dnorm(0,    1.0E-6)
  beta9  ~ dnorm(-0.5, 1.0E-6)
  beta10 ~ dnorm(0.5,  1.0E-6)
  beta11 ~ dnorm(0.5,  1.0E-6)
  beta12 ~ dnorm(-0.5, 1.0E-6)
  beta13 ~ dnorm(-0.5, 1.0E-6)
}
Remark: in RJags, the second argument of dnorm() is the precision rather than the variance, i.e. \(\tau^{2} = \frac{1}{\sigma^{2}}\).
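For instance, the precision 1.0E-6 used in the priors above corresponds to a very vague prior (a quick check, not from the report):

```r
# dnorm(mu, tau) in JAGS takes the precision tau = 1 / sigma^2
tau   <- 1.0e-6
sigma <- 1 / sqrt(tau)
sigma  # 1000: each beta has prior standard deviation 1000
```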
\[~\]
# Passing the data to RJags
data.jags <- list("y" = y, "N" = N,
                  "x2" = x2, "x3" = x3, "x4" = x4, "x8" = x8, "x9" = x9,
                  "x10" = x10, "x11" = x11, "x12" = x12, "x13" = x13)
# Defining the parameters of interest to track while running RJags
mod.params <- c("beta2", "beta3", "beta4", "beta8", "beta9",
                "beta10", "beta11", "beta12", "beta13")
\[~\]
# Run JAGS
set.seed(123)
n.chains <- 3
mod.fit <- jags(data = data.jags, # DATA
model.file = model, # MODEL
parameters.to.save = mod.params, # TRACKING
n.chains = n.chains, n.iter = 10000, n.burnin = 1000, n.thin=10) # MCMC
## module glm loaded
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 302
## Unobserved stochastic nodes: 9
## Total graph size: 3840
##
## Initializing model
mod.fit
## Inference for Bugs model at "C:/Users/Jeremy/AppData/Local/Temp/RtmpknYfOZ/model563036061b32.txt", fit using jags,
## 3 chains, each with 10000 iterations (first 1000 discarded), n.thin = 10
## n.sims = 2700 iterations saved
## mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff
## beta10 -0.489 0.212 -0.909 -0.628 -0.490 -0.344 -0.081 1.001 2700
## beta11 0.683 0.344 0.021 0.440 0.687 0.909 1.369 1.001 2700
## beta12 -0.846 0.199 -1.252 -0.977 -0.843 -0.711 -0.471 1.002 1900
## beta13 -0.851 0.277 -1.401 -1.038 -0.842 -0.659 -0.332 1.001 2700
## beta2 -1.652 0.445 -2.530 -1.955 -1.640 -1.356 -0.810 1.001 2700
## beta3 0.881 0.187 0.524 0.756 0.879 1.006 1.253 1.002 2700
## beta4 -0.015 0.008 -0.032 -0.020 -0.015 -0.009 0.001 1.002 1700
## beta8 0.031 0.008 0.016 0.026 0.031 0.037 0.047 1.002 1600
## beta9 -0.889 0.414 -1.688 -1.173 -0.895 -0.605 -0.069 1.001 2700
## deviance 225.361 4.463 218.784 222.067 224.697 227.789 235.802 1.001 2700
##
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
##
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 10.0 and DIC = 235.3
## DIC is an estimate of expected predictive error (lower deviance is better).
\[~\]
From this MCMC run we obtain interesting results:
mu.vect and sd.vect: the estimated posterior means and standard deviations obtained from the MCMC simulation.
the percentages (quantiles): the posterior quantiles, delimiting the intervals where each parameter is likely to lie.
Rhat: values near 1 suggest convergence.
n.eff: the effective sample size, a crude measure of how many independent draws the chains provide once in the stationary region.
\[~\]
\[~\]
Here, we show the univariate trace-plots of the simulations of each parameter:
\[~\]
\[~\] In this section we want to see the empirical average of \(\mathbf{\hat{I}}_{t}\) increasing the value of \(t \, = \, 1,...,T\). Before starting, we want to write the formula of \(\mathbf{\hat{I}}_{t}\) that is:
\[~\]
\[ \mathbf{I} \approx \mathbf{\hat{I}}_{t} = \frac{1}{t} \sum_{i=1}^{t} h(\theta^{(i)}) \]
\[~\]
It is important to write \(\approx\) because this approximation leverages the SLLN, and we also want to confirm this assumption empirically! So, let's see the results:
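As a sanity check of the SLLN behaviour, here is a minimal sketch on a synthetic chain (illustrative values, not the report's draws):

```r
# Running empirical average I_hat_t = (1/t) * sum_{i=1}^{t} h(theta^(i));
# by the SLLN it stabilizes around E[h(theta)] as t grows
set.seed(42)
theta <- rnorm(5000, mean = -0.5, sd = 0.2)       # stand-in for posterior draws
running_mean <- cumsum(theta) / seq_along(theta)  # I_hat_t for t = 1, ..., T
tail(running_mean, 1)                             # close to the true mean -0.5
```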
\[~\]
\[~\]
\[~\]
We now consider other interesting plots, also highlighting each chain on its own:
\[~\] Now, we want to see the autocorrelations of each parameter:
## beta10 beta11 beta12 beta13 beta2
## Lag 0 1.000000000 1.000000000 1.000000000 1.000000000 1.000000000
## Lag 10 0.028749715 0.013027320 0.011401177 0.008061257 0.009605275
## Lag 50 0.004853036 0.007741027 0.006044363 -0.018532707 0.032820945
## Lag 100 -0.062010722 -0.017318088 0.005858652 0.022871691 0.044325748
## Lag 500 0.022963511 0.034443073 -0.006998113 0.004928576 0.002334635
## beta3 beta4 beta8 beta9 deviance
## Lag 0 1.00000000 1.000000000 1.00000000 1.000000000 1.000000e+00
## Lag 10 0.02486498 0.001619878 0.04591646 -0.004177052 -1.524460e-02
## Lag 50 0.02580283 -0.048039671 -0.04290369 -0.023810367 -9.722008e-03
## Lag 100 0.02351930 -0.011874447 -0.01503457 -0.028663890 -9.939696e-05
## Lag 500 0.03144189 0.021556527 0.01383545 0.018783666 1.420195e-02
\[~\] The ACFs behave well: we need samples that are approximately independent of each other across iterations, and here we see that the correlations drop off sharply after lag 0 and stay near zero. This is a good sign!
We can confirm this intuition with the values returned by autocorr.diag(), which reports the average autocorrelations across all chains.
\[~\]
\[~\] Here, we want to analyze the approximation error of the MCMC sampling. We consider, essentially, the square root of the Monte Carlo variance, i.e. the Monte Carlo standard error (MCSE).
The variance formula in MCMC sampling is:
\[ \mathbf{V}[\hat{I}_{t}] = \frac{Var_{\pi}[h(X_{1})]}{T_{eff}} = \Big( 1 + 2 \sum_{k=1}^{\infty} \rho_{k}\Big)\frac{\sigma^{2}}{T} \] The MCSE is the square root of the formula written above. The results for each chain are:
\[~\]
n <- length(colnames(mod.fit$BUGSoutput$sims.matrix))
mcse_dataframe <- data.frame(MCSE_Chain1 = rep(NA, n), MCSE_Chain2 = rep(NA, n),
                             MCSE_Chain3 = rep(NA, n), MCSE_mean = rep(NA, n))
rownames(mcse_dataframe) <- colnames(mod.fit$BUGSoutput$sims.matrix)[1:n]
for(colname in colnames(mod.fit$BUGSoutput$sims.matrix)[1:n]){
  for(j in 1:n.chains)
    mcse_dataframe[colname, j] <- LaplacesDemon::MCSE(mod.fit$BUGSoutput$sims.array[, j, colname])
  mcse_dataframe[colname, "MCSE_mean"] <- mean(unlist(mcse_dataframe[colname, c(1:3)]))
}
mcse_dataframe
## MCSE_Chain1 MCSE_Chain2 MCSE_Chain3 MCSE_mean
## beta10 0.0073261667 0.0079692464 0.0077954121 0.0076969417
## beta11 0.0115131110 0.0118303126 0.0120922370 0.0118118869
## beta12 0.0065660674 0.0073391536 0.0064715431 0.0067922547
## beta13 0.0100578467 0.0090567755 0.0094117785 0.0095088003
## beta2 0.0145511257 0.0141811879 0.0177755410 0.0155026182
## beta3 0.0070692107 0.0063147178 0.0072267372 0.0068702219
## beta4 0.0002681413 0.0003043189 0.0002726522 0.0002817041
## beta8 0.0002727964 0.0003043320 0.0002715205 0.0002828829
## beta9 0.0129126027 0.0146421457 0.0146747039 0.0140764841
## deviance 0.1550249861 0.1362655438 0.1552283978 0.1488396426
\[~\]
\[~\] Here, we want to understand whether, with these multiple simulations of the Markov chains, we have achieved convergence. We consider one of the tools seen in the last lectures, which is more powerful than the others for assessing convergence.
\[~\]
\[~\] The Heidelberger-Welch diagnostic implements a convergence test and removes up to half the chain in order to ensure that the means are estimated from a chain that has converged.
The convergence test uses the Cramer-von Mises statistic to test the null hypothesis that the sampled values come from a stationary distribution.
The test is applied successively, first to the whole chain, then after discarding the first 10%, 20%, … of the chain, until either the null hypothesis is accepted or 50% of the chain has been discarded.
The latter outcome constitutes 'failure' of the stationarity test and indicates that a longer MCMC run is needed. If the stationarity test is passed, the number of iterations to keep and the number to discard are reported.
The half-width test calculates a 95% confidence interval for the mean, using the portion of the chain which passed the stationarity test.
Half the width of this interval is compared with the estimate of the mean. If the ratio between the half-width and the mean is lower than eps, the half-width test is passed. Otherwise the sample is deemed not long enough to estimate the mean with sufficient accuracy.
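The diagnostic described above is provided by coda::heidel.diag(); a minimal self-contained sketch on a synthetic stationary chain (the eps and pvalue values shown are coda's defaults):

```r
library(coda)

set.seed(1)
chain <- mcmc(rnorm(2000))                    # a clearly stationary chain
hw <- heidel.diag(chain, eps = 0.1, pvalue = 0.05)
hw[1, "stest"]                                # 1 = stationarity test passed
```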
\[~\]
## [[1]]
##
## Stationarity start p-value
## test iteration
## beta10 passed 1 0.3239
## beta11 passed 1 0.6880
## beta12 passed 1 0.2287
## beta13 passed 1 0.7323
## beta2 passed 1 0.7564
## beta3 passed 1 0.7176
## beta4 passed 1 0.0622
## beta8 passed 1 0.2670
## beta9 passed 1 0.6451
## deviance passed 1 0.1120
##
## Halfwidth Mean Halfwidth
## test
## beta10 passed -0.4835 0.014111
## beta11 passed 0.6757 0.020904
## beta12 passed -0.8545 0.012844
## beta13 passed -0.8541 0.017958
## beta2 passed -1.6552 0.029211
## beta3 passed 0.8795 0.012322
## beta4 passed -0.0147 0.000559
## beta8 passed 0.0316 0.000491
## beta9 passed -0.8927 0.025430
## deviance passed 225.2703 0.297930
##
## [[2]]
##
## Stationarity start p-value
## test iteration
## beta10 passed 1 0.233
## beta11 passed 1 0.254
## beta12 passed 1 0.855
## beta13 passed 1 0.784
## beta2 passed 1 0.699
## beta3 passed 1 0.153
## beta4 passed 1 0.642
## beta8 passed 1 0.812
## beta9 passed 1 0.423
## deviance passed 1 0.769
##
## Halfwidth Mean Halfwidth
## test
## beta10 passed -0.4928 0.013672
## beta11 passed 0.6839 0.022893
## beta12 passed -0.8391 0.013463
## beta13 passed -0.8527 0.016625
## beta2 passed -1.6389 0.028610
## beta3 passed 0.8842 0.011386
## beta4 passed -0.0149 0.000536
## beta8 passed 0.0315 0.000566
## beta9 passed -0.8891 0.027043
## deviance passed 225.3264 0.268161
##
## [[3]]
##
## Stationarity start p-value
## test iteration
## beta10 passed 1 0.3448
## beta11 passed 1 0.1388
## beta12 passed 1 0.6863
## beta13 passed 1 0.1368
## beta2 passed 1 0.0714
## beta3 passed 1 0.3191
## beta4 passed 1 0.5506
## beta8 passed 1 0.8238
## beta9 passed 1 0.8632
## deviance passed 1 0.5795
##
## Halfwidth Mean Halfwidth
## test
## beta10 passed -0.4906 0.013692
## beta11 passed 0.6899 0.022293
## beta12 passed -0.8431 0.012653
## beta13 passed -0.8475 0.018361
## beta2 passed -1.6604 0.031817
## beta3 passed 0.8795 0.014677
## beta4 passed -0.0142 0.000546
## beta8 passed 0.0310 0.000524
## beta9 passed -0.8847 0.031369
## deviance passed 225.4861 0.294261
\[~\] We prefer the Heidelberger-Welch test because it double-checks both the stationarity and the convergence (half-width) conditions. It sometimes fails even when the other tools look fine, so I prefer to rely on this stricter test.
Here, every test is passed, so our intuitions are confirmed!
\[~\]
\[~\]
Here, we illustrate how to use the Markov chain to approximate the posterior predictive distribution of whether a person will have a heart attack, given the features described above. So, we want to approximate:
\[ m(y_{new}) = \int_{\Theta} f(y_{new}|\theta, x_2, x_3, x_4, x_8, x_9, x_{10}, x_{11}, x_{12}, x_{13})\,\pi(\theta|Y_{old})\,d\theta \]
where \(\theta\) collects the coefficients used and considered before, \(\Theta\) is the parameter space, and \(\pi(\theta|Y_{old})\) is the posterior distribution of the coefficients. In the end, we sample \(\theta\) from \(\pi(\theta|Y_{old})\) (the MCMC draws) and then \(y_{new}\) from \(f(y_{new}|\theta, x_2, \dots, x_{13})\).
Here is a brief practical example:
new_record <- list(x2 = sample(x2, 1, replace = T), x3 = sample(x3, 1, replace = T),
                   x4 = sample(x4, 1, replace = T), x8 = sample(x8, 1, replace = T),
                   x9 = sample(x9, 1, replace = T), x10 = sample(x10, 1, replace = T),
                   x11 = sample(x11, 1, replace = T), x12 = sample(x12, 1, replace = T),
                   x13 = sample(x13, 1, replace = T))
sims <- mod.fit$BUGSoutput$sims.matrix
x <- sims[, "beta2"]*new_record$x2 + sims[, "beta3"]*new_record$x3 +
  sims[, "beta4"]*new_record$x4 + sims[, "beta8"]*new_record$x8 +
  sims[, "beta9"]*new_record$x9 + sims[, "beta10"]*new_record$x10 +
  sims[, "beta11"]*new_record$x11 + sims[, "beta12"]*new_record$x12 +
  sims[, "beta13"]*new_record$x13
x_mu <- 1/(1+exp(-x))
y_pred <- unlist(lapply(x_mu, function(x) rbinom(n = 1, size = 1, prob = x)))
sd(y_pred)
## [1] 0.3478183
prop.table(table(y_pred))
## y_pred
## 0 1
## 0.8592593 0.1407407
prop.table(table(y))
## y
## 0 1
## 0.4569536 0.5430464
summary(y_pred)[4]
## Mean
## 0.1407407
hchart(y_pred, type = "column", name = "model1", color = randomcoloR::randomColor()) %>%
hc_title(text = "Predictions with model 1") %>%
hc_xAxis(title = "model1") %>%
hc_chart(options3d=list(enabled=TRUE, alpha=2, beta=-10,
depth=100, viewDistance=25)) %>%
hc_plotOptions(column=list(depth= 100))
After reading the whole project report, we will see which model is the best in terms of predictions. Stay tuned!
\[~\]
\[~\]
Here, we report the correlations between all the parameter values sampled during the MCMC, considering the joined matrix of the chains:
\[~\]
\[~\] Now, we focus on another model: still a logistic-type regression, but with a different link function which, as you can see in the previous chapters, is the cloglog function.
Why this change? Because we want to see which model is better… and how can you see this? I'll tell you where you can easily see this preference.
What is a link function? The link function transforms the probabilities of the levels of a categorical response variable onto a continuous, unbounded scale. Once the transformation is done, the relationship between the predictors and the response can be modeled linearly, as in logistic regression for example.
As we can see, we reproduce this second model changing only the link function, to see whether it fits better than the previous model:
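A sketch of how the second model's JAGS code differs (only the link changes; the priors are assumed identical to model 1):

```r
model2 <- function(){
  # Likelihood: same linear predictor as model 1, but with the cloglog link
  for (i in 1:N){
    y[i] ~ dbern(p[i])
    cloglog(p[i]) <- beta2*x2[i] + beta3*x3[i] + beta4*x4[i] + beta8*x8[i] +
      beta9*x9[i] + beta10*x10[i] + beta11*x11[i] + beta12*x12[i] + beta13*x13[i]
  }
  # Same vague normal priors (precision 1.0E-6) as in model 1
  beta2 ~ dnorm(-1, 1.0E-6);    beta3 ~ dnorm(0.5, 1.0E-6)
  beta4 ~ dnorm(0, 1.0E-6);     beta8 ~ dnorm(0, 1.0E-6)
  beta9 ~ dnorm(-0.5, 1.0E-6);  beta10 ~ dnorm(0.5, 1.0E-6)
  beta11 ~ dnorm(0.5, 1.0E-6);  beta12 ~ dnorm(-0.5, 1.0E-6)
  beta13 ~ dnorm(-0.5, 1.0E-6)
}
```

The rest of the pipeline (data.jags, mod.params and the jags() call) stays the same as for model 1.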
\[~\]
## module glm loaded
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 302
## Unobserved stochastic nodes: 9
## Total graph size: 3840
##
## Initializing model
## Inference for Bugs model at "C:/Users/Jeremy/AppData/Local/Temp/Rtmp42sAgQ/model62a433132292.txt", fit using jags,
## 3 chains, each with 10000 iterations (first 1000 discarded), n.thin = 10
## n.sims = 2700 iterations saved
## mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff
## beta10 -0.494 0.211 -0.930 -0.629 -0.489 -0.356 -0.077 1.000 2700
## beta11 0.682 0.344 -0.008 0.453 0.682 0.909 1.343 1.001 2700
## beta12 -0.852 0.202 -1.255 -0.988 -0.849 -0.714 -0.464 1.000 2700
## beta13 -0.846 0.270 -1.390 -1.026 -0.843 -0.663 -0.335 1.001 2700
## beta2 -1.643 0.443 -2.538 -1.931 -1.622 -1.340 -0.807 1.001 2700
## beta3 0.873 0.190 0.508 0.743 0.875 0.999 1.258 1.003 2600
## beta4 -0.014 0.008 -0.031 -0.020 -0.014 -0.009 0.001 1.001 2500
## beta8 0.031 0.008 0.016 0.026 0.031 0.036 0.047 1.001 2700
## beta9 -0.882 0.414 -1.679 -1.161 -0.883 -0.603 -0.090 1.001 2500
## deviance 225.326 4.559 218.528 222.066 224.650 227.855 236.465 1.004 1000
##
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
##
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 10.4 and DIC = 235.7
## DIC is an estimate of expected predictive error (lower deviance is better).
\[~\] Can you see something different? It seems so: look at the DIC! The previous model is better (although the difference is tiny, the previous model wins).
…mmh, but what is the DIC?
\[~\]
\[~\] As we can see above, we tried a different model, changing only the link function, to see whether it fits our data better; the DIC (Deviance Information Criterion) is the indicator that tells us which model is better.
The deviance information criterion (DIC) is a hierarchical modeling generalization of the Akaike information criterion (AIC). It is particularly useful in Bayesian model selection problems where the posterior distributions of the models have been obtained by Markov chain Monte Carlo (MCMC) simulation. Like AIC, DIC is an asymptotic approximation as the sample size becomes large.
The DIC is calculated as:
\[ DIC = p_D + \overline{D(\theta)} \] where \(p_D\) is the effective number of parameters and \(\overline{D(\theta)}\) is the posterior mean of the deviance.
The larger the effective number of parameters, the easier it is for the model to fit the data, and so the deviance is penalized accordingly.
The lower the DIC value, the better the model; in this case, the first model is better.
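The rule quoted in the JAGS output above (pD = var(deviance)/2) can be reproduced directly from the saved deviance samples; a sketch on made-up values (the real computation would use the "deviance" column of mod.fit$BUGSoutput$sims.matrix):

```r
# DIC = pD + mean(deviance), with pD estimated as var(deviance) / 2
dic_from_deviance <- function(dev) var(dev) / 2 + mean(dev)

dev <- c(220, 224, 226, 230)   # hypothetical posterior deviance draws
dic_from_deviance(dev)          # 225 + 52/6, roughly 233.67
```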
\[~\]
\[~\]
\[~\]
\[~\]